CROSS REFERENCE TO RELATED APPLICATION

BACKGROUND OF THE INVENTION

The present invention relates to vector quantization (VQ) in speech
coding systems using waveform interpolation.

In recent years, there has been increasing interest in achieving
toll-quality speech coding at rates of 4 kbps and below. Currently, there
is an ongoing 4 kbps standardization effort conducted by an international
standards body, the International Telecommunication Union
Telecommunication Standardization Sector (ITU-T). The expanding variety
of emerging applications for speech coding, such as third generation
wireless networks and Low Earth Orbit (LEO) systems, is motivating
increased research efforts. The speech quality produced by waveform
coders such as code-excited linear prediction (CELP) coders degrades
rapidly at rates below 5 kbps; see B. S. Atal and M. R. Schroeder,
(1984), "Stochastic Coding of Speech at Very Low Bit Rate", Proc. Int.
Conf. Comm., Amsterdam, pp. 1610-1613.

On the other hand, parametric coders, such as the waveform-interpolative
(WI) coder, the sinusoidal-transform coder (STC), and the
multiband-excitation (MBE) coder, produce good quality at low rates but
do not achieve toll quality; see Y. Shoham, IEEE ICASSP'93, Vol. II,
pp. 167-170 (1993); I. S. Burnett and R. J. Holbeche, (1993), IEEE
ICASSP'93, Vol. II, pp. 175-178; W. B. Kleijn, (1993), IEEE Trans. Speech
and Audio Processing, Vol. 1, No. 4, pp. 386-399; W. B. Kleijn and J.
Haagen, (1994), IEEE Signal Processing Letters, Vol. 1, No. 9, pp.
136-138; W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508-511;
W. B. Kleijn and J. Haagen, (1995), in Speech Coding and Synthesis by
W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp.
175-207; I. S. Burnett and G. J. Bradley, (1995), IEEE ICASSP'95, pp.
261-263; I. S. Burnett and G. J. Bradley, (1995), IEEE Workshop on Speech
Coding for Telecommunications, pp. 23-24; I. S. Burnett and D. H. Pham,
(1997), IEEE ICASSP'97, pp. 1567-1570; W. B. Kleijn, Y. Shoham, D. Sen,
and R. Haagen, (1996), IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997),
IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), International Journal
of Speech Technology, Kluwer Academic Publishers, pp. 329-341; R. J.
McAulay and T. F. Quatieri, (1995), in Speech Coding and Synthesis by
W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp.
121-173; and D. Griffin and J. S. Lim, (1988), IEEE Trans. ASSP, Vol. 36,
No. 8, pp. 1223-1235. This is largely due to the lack of robustness of
speech parameter estimation, which is commonly done in open loop, and to
inadequate modeling of non-stationary speech segments.

Commonly in WI coding, the similarity between successive rapidly
evolving waveform (REW) magnitudes is exploited by downsampling and
interpolation and by constrained bit allocation; see W. B. Kleijn and J.
Haagen, (1995), IEEE ICASSP'95, pp. 508-511. In a previous Enhanced
Waveform Interpolative (EWI) coder, the REW magnitude was quantized on a
waveform-by-waveform basis; see O. Gottesman and A. Gersho, (1999),
"Enhanced Waveform Interpolative Coding at 4 kbps", IEEE Speech Coding
Workshop, pp. 90-92, Finland; O. Gottesman and A. Gersho, (1999),
"Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps",
EUROSPEECH'99, pp. 1443-1446, Hungary.

SUMMARY OF THE INVENTION 

The present invention describes novel methods that enhance the
performance of the WI coder and allow for better coding efficiency,
improving on the above 1999 Gottesman and Gersho procedure. The present
invention incorporates analysis-by-synthesis (AbS) for parameter
estimation, offers higher temporal and spectral resolution for the REW,
and provides more efficient quantization of the slowly evolving waveform
(SEW). In particular, the present invention proposes a novel efficient
parametric representation of the REW magnitude, an efficient paradigm for
AbS predictive VQ of the REW parameter sequence, and dual-predictive AbS
quantization of the SEW.

More particularly, the invention provides a method for interpolative
coding of input signals, the signals being decomposed into or composed of
a slowly evolving waveform and a rapidly evolving waveform having a
magnitude, the method incorporating at least one, preferably a
combination, and possibly all of the following steps:

(a) AbS VQ of the REW;

(b) parametrizing the magnitude of the REW; 

(c) incorporating temporal weighting in the AbS VQ of the REW; 

(d) incorporating spectral weighting in the AbS VQ of the REW; 

(e) applying a filter to a vector quantizer codebook in the 
analysis-by-synthesis vector-quantization of the rapidly evolving waveform 
whereby to add self correlation to the codebook vectors; and 

(f) using a coder in which a plurality of bits therein are allocated 
to the rapidly evolving waveform magnitude. 

In addition, one can combine AbS quantization of the slowly 
evolving waveform with any or all of the foregoing parameters. 

The new method achieves a substantial reduction in the REW bit
rate, and the EWI coder comes very close to toll quality, at least under
clean speech conditions. These and other features, aspects, and advantages of
the present invention will become better understood with regard to the 
following detailed description, appended claims, and accompanying 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a REW Parametric Representation;

Figure 2 is a REW Parametric VQ;

Figure 3 is a REW Parametric Representation AbS VQ;

Figure 4 is a REW Parametric Representation Simplified AbS VQ;

Figure 5 is a REW Parametric Representation Simplified Weighted AbS VQ;

Figure 6 is a block diagram of the Dual Predictive AbS SEW vector
quantization;

Figure 7 is a weighted Signal-to-Noise Ratio (SNR) for Dual Predictive
AbS SEW VQ;

Figure 8 is an output Weighted SNR for the 18 codebooks, 9-bit AbS SEW
VQ;

Figure 9 is a mean-removed SEW's Weighted SNR for the 18 codebooks,
9-bit AbS SEW VQ; and

Figure 10 shows predictors for three REW parameter ranges.

DETAILED DESCRIPTION 

In very low bit rate WI coding, the relation between the SEW and
the REW magnitudes was exploited by computing the magnitude of one as
the unity complement of the other; see W. B. Kleijn and J. Haagen,
(1995), "A Speech Coder Based on Decomposition of Characteristic
Waveforms", IEEE ICASSP'95, pp. 508-511; W. B. Kleijn and J. Haagen,
(1995), "Waveform Interpolation for Coding and Synthesis", in Speech
Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science
B. V., Chapter 5, pp. 175-207; I. S. Burnett and G. J. Bradley, (1995),
"New Techniques for Multi-Prototype Waveform Coding at 2.84 kb/s", IEEE
ICASSP'95, pp. 261-263; I. S. Burnett and G. J. Bradley, (1995), "Low
Complexity Decomposition and Coding of Prototype Waveforms", IEEE
Workshop on Speech Coding for Telecommunications, pp. 23-24; I. S.
Burnett and D. H. Pham, (1997), "Multi-Prototype Waveform Coding using
Frame-by-Frame Analysis-by-Synthesis", IEEE ICASSP'97, pp. 1567-1570;
W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), "A Low-Complexity
Waveform Interpolation Coder", IEEE ICASSP'96, pp. 212-215; Y. Shoham,
(1997), "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4
kbps", IEEE ICASSP'97, pp. 1599-1602; Y. Shoham, (1999), "Low-Complexity
Speech Coding at 1.2 to 2.4 kbps Based on Waveform Interpolation",
International Journal of Speech Technology, Kluwer Academic Publishers,
pp. 329-341.

Also, since the sequence of SEW magnitudes evolves slowly, successive
SEWs exhibit similarity, offering opportunities for redundancy removal.
Additional forms of redundancy that may be exploited for coding
efficiency are: (a) for a fixed SEW/REW decomposition filter, the mean
SEW magnitude increases with the pitch period, and (b) the similarity
between successive SEWs also increases with the pitch period. In this
work we introduce a novel "dual-predictive" AbS paradigm for quantizing
the SEW magnitude that optimally exploits the information about the
current quantized REW, the past quantized SEW, and the pitch, in order to
predict the current SEW.

Introduction to REW Quantization

The REW represents the rapidly changing unvoiced attribute of
speech. Commonly in WI systems, the REW is quantized on a
waveform-by-waveform basis. Hence, for low rate WI systems having a long
frame size and a large number of waveforms per frame, the relative bit
rate required for the REW becomes excessive. For example, consider a
potential 2 kbps system which uses a 240 sample frame, 12 waveforms per
frame, and which quantizes the REW by an alternating bit allocation of 3
bits and 1 bit per waveform. The REW bit rate is then 24 bits per frame,
or 800 bps, which is 40% of the total bit rate. This example demonstrates
the need for a more efficient REW quantization.
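The arithmetic of this example can be checked with a short sketch. The 8 kHz sampling rate is an assumption (the text gives only the frame size in samples), chosen because it makes a 240 sample frame last 30 ms; the function name `rew_bitrate` is illustrative.

```python
def rew_bitrate(frame_samples, waveforms_per_frame, bits_pattern, sample_rate_hz=8000):
    """Return (bits per frame, bits per second) for a repeating per-waveform bit pattern.

    sample_rate_hz=8000 is an assumption; the text does not state it.
    """
    bits_per_frame = sum(
        bits_pattern[i % len(bits_pattern)] for i in range(waveforms_per_frame)
    )
    bits_per_second = bits_per_frame * sample_rate_hz / frame_samples
    return bits_per_frame, bits_per_second


# 240 sample frame, 12 waveforms, alternating 3 bits and 1 bit per waveform
bits_per_frame, rew_bps = rew_bitrate(240, 12, (3, 1))
share_of_total = rew_bps / 2000.0  # 2 kbps total budget from the example
print(bits_per_frame, rew_bps, share_of_total)
```

Under that assumption the sketch gives 24 bits per frame, 800 bps, and a 40% share of the 2 kbps budget.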

Efficient REW quantization can benefit from two observations: (1)
the REW magnitude is typically an increasing function of the frequency,
which suggests that an efficient parametric representation may be used;
(2) one can observe a similarity between successive REW magnitude
spectra, which suggests a potential gain from employing predictive VQ
on a group of adjacent REWs. The next two sections propose a REW
parametric representation and its respective VQ.

REW Parametric Representation

Direct quantization of the REW magnitude is a variable dimension
quantization problem, which may result in spending bits and computational
effort on perceptually irrelevant information. A simple and practical way
to obtain a reduced, and fixed, dimension representation of the REW is
with a linear combination of basis functions, such as orthonormal
polynomials; see W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996),
IEEE ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp.
1599-1602; Y. Shoham, (1999), International Journal of Speech Technology,
Kluwer Academic Publishers, pp. 329-341. Such a representation usually
produces a smoother REW magnitude, and improves the perceptual quality.
Suppose the REW magnitude, $R(\omega)$, is represented by a linear
combination of orthonormal functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \tag{1}$$

where $\omega$ is the angular frequency, and $I$ is the representation
order. The REW magnitude is typically an increasing function of
frequency, which can be coarsely quantized with a low number of bits per
waveform without significant perceptual degradation. Therefore, it may be
advantageous to represent the REW magnitude in a simple, but perceptually
relevant manner. Consequently we model the REW by the following
parametric representation, $R(\omega,\xi)$:

$$R(\omega,\xi) = \sum_{i=0}^{I-1} \gamma_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \tag{2}$$

where $\gamma(\xi) = [\gamma_0(\xi), \ldots, \gamma_{I-1}(\xi)]^T$ is a
parametric vector of coefficients within the representation model
subspace, and $\xi$ is the "unvoicing" parameter, which is zero for a
fully voiced spectrum and one for a fully unvoiced spectrum. Thus
$R(\omega,\xi)$ defines a two-dimensional surface whose cross sections
for each value of $\xi$ give a particular REW magnitude spectrum, which
is defined merely by specifying a scalar parameter value.

A simple and practical way for parametric representation of the
REW is, for example, by a parametric linear combination of basis
functions, such as polynomials with parametric coefficients, namely:

$$R(\omega,\xi) = \sum_{i=0}^{I-1} \gamma_i(\xi)\,\omega^i, \quad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \tag{3}$$

For practical considerations assume that the parametric representation is
a piecewise linear function of $\xi$ and may therefore be represented by
a set of N uniformly spaced spectra, as illustrated in FIGURE 1.
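As a minimal sketch of the fixed-dimension representation in equation (1), the following projects a sampled REW magnitude onto a small orthonormal polynomial basis. The grid size, the order I = 3, the rectangle-rule integration, the QR-based orthonormalization, and the toy magnitude are all assumptions made for illustration; they are not taken from the coder described here.

```python
import numpy as np


def orthonormal_poly_basis(omega, order):
    """Sample `order` polynomials (degree 0..order-1) that are orthonormal
    under the rectangle-rule approximation of the integral over the grid."""
    d_omega = omega[1] - omega[0]
    vander = np.vander(omega, N=order, increasing=True)  # columns 1, w, w^2, ...
    q, _ = np.linalg.qr(vander * np.sqrt(d_omega))
    return q / np.sqrt(d_omega)  # psi_i(w) sampled on the grid


def project(magnitude, psi, d_omega):
    """gamma_i = integral of R(w) psi_i(w) dw, approximated by a sum."""
    return psi.T @ magnitude * d_omega


omega = np.linspace(0.0, np.pi, 256)
d_omega = omega[1] - omega[0]
psi = orthonormal_poly_basis(omega, order=3)

rew_mag = 0.2 + 0.25 * omega          # toy increasing magnitude (invented data)
gamma = project(rew_mag, psi, d_omega)  # I = 3 coefficients
rew_smooth = psi @ gamma                # reconstruction as in equation (1)
```

Because the toy magnitude is itself a low-degree polynomial, the reconstruction reproduces it; a real REW magnitude would instead be smoothed by the projection.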

REW Parametric Vector Quantization 

One can observe the similarity between successive REW magnitude
spectra, which suggests a potential gain from VQ of a set of successive
REWs. Figure 2 illustrates a simple parametric VQ system for a vector of
REW spectra. The input is an M-dimensional vector of REW magnitude
spectra,

$$\mathbf{R}(\omega) = [R_1(\omega), R_2(\omega), \ldots, R_M(\omega)]^T \tag{4}$$

and the VQ output is an index, $j$, which determines a quantized
parameter vector, $\hat{\xi}$:

$$\hat{\xi} = [\hat{\xi}_1, \hat{\xi}_2, \ldots, \hat{\xi}_M]^T \tag{5}$$

which parametrically determines a vector of quantized spectra:

$$\hat{\mathbf{R}}(\omega) = \mathbf{R}(\omega, \hat{\xi}) = [R(\omega,\hat{\xi}_1), R(\omega,\hat{\xi}_2), \ldots, R(\omega,\hat{\xi}_M)]^T \tag{6}$$

The encoder searches, in the parameter codebook $C_Q(\xi)$, for the
parameter vector which minimizes the distortion:

$$\hat{\xi} = \arg\min_{\xi \in C_Q(\xi)} \left\{ \sum_{m=1}^{M} D\big(R_m, R(\xi_m)\big) \right\} \tag{7}$$

For example, suppose the input REW magnitude is represented by an
I-dimensional vector of function coefficients, $\gamma$, given by:

$$\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_{I-1}]^T \tag{8}$$

For a set of M input REWs, each of which is represented by a vector of
polynomial coefficients, $\gamma_m$, which form an $I \times M$ input
coefficient matrix, $\Gamma$:

$$\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_M] \tag{9}$$

The inverse VQ output is a vector of M quantized REWs, which form the
quantized function coefficient matrix:

$$\Gamma(\hat{\xi}) = [\gamma(\hat{\xi}_1), \gamma(\hat{\xi}_2), \ldots, \gamma(\hat{\xi}_M)] \tag{10}$$

which is used by the decoder to compute the quantized spectra.

A. Quantization Using Orthonormal Functions

Orthonormal functions, such as polynomials, may be used for
efficient quantization of the REW; see W. B. Kleijn, et al., (1996), IEEE
ICASSP'96, pp. 212-215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599-1602;
Y. Shoham, (1999), International Journal of Speech Technology, Kluwer
Academic Publishers, pp. 329-341. Consider the REW magnitude,
$R(\omega)$, represented by a linear combination of orthonormal
functions, $\psi_i(\omega)$:

$$R(\omega) = \sum_{i=0}^{I-1} \gamma_i \psi_i(\omega), \quad 0 \le \omega \le \pi \tag{11}$$

which is modeled using the parametric representation:

$$R(\omega,\xi) = \sum_{i=0}^{I-1} \gamma_i(\xi) \psi_i(\omega), \quad 0 \le \omega \le \pi;\ 0 \le \xi \le 1 \tag{12}$$

The quantized REW parameter is then given by:

$$\hat{\xi} = \arg\min_{\xi} \big\{ D\big(R, R(\xi)\big) \big\} = \arg\min_{\xi} \big\{ \|\gamma - \gamma(\xi)\|^2 \big\} \tag{13}$$

In the VQ case, the quantized parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_Q(\xi)} \left\{ \sum_{m=1}^{M} D\big(R_m, R(\xi_m)\big) \right\} = \arg\min_{\xi \in C_Q(\xi)} \left\{ \sum_{m=1}^{M} \|\gamma_m - \gamma(\xi_m)\|^2 \right\} \tag{14}$$
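The codebook search of equations (7) and (14) can be sketched as an exhaustive loop over candidate parameter vectors. This is a sketch under stated assumptions: the mapping `gamma_of_xi` uses a piecewise-linear table of coefficient vectors, and the table, the codebook, and the input data are invented toy values; all function names are hypothetical.

```python
import numpy as np


def gamma_of_xi(xi, grid_xi, grid_gamma):
    """Piecewise-linear coefficient vector gamma(xi) from N tabulated vectors."""
    return np.array([np.interp(xi, grid_xi, grid_gamma[:, i])
                     for i in range(grid_gamma.shape[1])])


def search_parameter_codebook(gammas_in, codebook, grid_xi, grid_gamma):
    """Return (index, distortion) of the codevector that minimizes
    sum_m ||gamma_m - gamma(xi_m)||^2, cf. equation (14)."""
    best_j, best_d = -1, np.inf
    for j, xi_vec in enumerate(codebook):
        d = sum(np.sum((g - gamma_of_xi(xi, grid_xi, grid_gamma)) ** 2)
                for g, xi in zip(gammas_in, xi_vec))
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


# Toy data, invented for illustration: N = 5 table entries, I = 2, M = 2
grid_xi = np.linspace(0.0, 1.0, 5)
grid_gamma = np.stack([np.array([0.2 + 0.6 * x, 0.3 * x]) for x in grid_xi])
codebook = np.array([[0.1, 0.2], [0.5, 0.6], [0.9, 0.8]])
gammas_in = [gamma_of_xi(0.52, grid_xi, grid_gamma),
             gamma_of_xi(0.58, grid_xi, grid_gamma)]

j_best, d_best = search_parameter_codebook(gammas_in, codebook, grid_xi, grid_gamma)
```

With these toy values the inputs lie close to the second codevector, so the search selects index 1.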



B. Piecewise Linear Parametric Representation 
In order to have a simple representation that is computationally
efficient and avoids excessive memory requirements, we model the two
dimensional surface by a piecewise linear parametric representation.
Therefore, we introduce a set of N uniformly spaced spectra,
$\{R(\omega,\xi_n)\}_{n=0}^{N-1}$. Then the parametric surface is defined
by linear interpolation according to:

$$R(\omega,\xi) = (1-\alpha)\,R(\omega,\xi_{n-1}) + \alpha\,R(\omega,\xi_n); \quad \xi_{n-1} \le \xi \le \xi_n; \quad \alpha = \frac{\xi - \xi_{n-1}}{\xi_n - \xi_{n-1}} \tag{15}$$

Because this representation is linear, the coefficients of
$R(\omega,\xi)$ are linear combinations of the coefficients of
$R(\omega,\xi_{n-1})$ and $R(\omega,\xi_n)$. Hence,

$$\gamma(\xi) = (1-\alpha)\,\gamma_{n-1} + \alpha\,\gamma_n \tag{16}$$

where $\gamma_n$ is the coefficient vector of the n-th REW magnitude
function representation:

$$\gamma_n = \gamma(\xi_n) \tag{17}$$

In this case, the distortion may be interpolated by:

$$D\big(R, R(\xi)\big) = \int_0^{\pi} \big| R(\omega) - (1-\alpha)R(\omega,\xi_{n-1}) - \alpha R(\omega,\xi_n) \big|^2 d\omega = \big\| \gamma - (1-\alpha)\gamma_{n-1} - \alpha\gamma_n \big\|^2 \tag{18}$$

The above can be easily generalized to the parameter VQ case. The
optimal interpolation factor that minimizes the distortion between two
representation vectors is given by:

$$\alpha_{opt} = \frac{(\gamma_n - \gamma_{n-1})^T (\gamma - \gamma_{n-1})}{\|\gamma_n - \gamma_{n-1}\|^2} \tag{19}$$

and the respective optimal parameter value, which is a continuous
variable between zero and one, is given by:

$$\xi(\gamma) = (1-\alpha_{opt})\,\xi_{n-1} + \alpha_{opt}\,\xi_n \tag{20}$$

This result allows a rapid search for the best unvoicing parameter value
needed to transform the coefficient vector to a scalar parameter, followed
by the corresponding quantization scheme, as described in section 4.
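The rapid search described above can be sketched as a loop over the N-1 segments, applying the closed-form interpolation factor on each one. This is a sketch, not the coder's actual routine: clamping the factor to [0, 1] so that the result stays inside the segment is an assumption, and the table of coefficient vectors is invented toy data.

```python
import numpy as np


def best_unvoicing_parameter(gamma, grid_xi, grid_gamma):
    """Search all segments; return (xi, distortion) minimizing
    ||gamma - (1 - a) gamma_{n-1} - a gamma_n||^2."""
    best_xi, best_d = None, np.inf
    for n in range(1, len(grid_xi)):
        g0, g1 = grid_gamma[n - 1], grid_gamma[n]
        step = g1 - g0
        denom = step @ step
        a = (step @ (gamma - g0)) / denom if denom > 0 else 0.0  # optimal factor
        a = min(max(a, 0.0), 1.0)  # assumption: keep xi inside the segment
        d = np.sum((gamma - (1 - a) * g0 - a * g1) ** 2)         # segment distortion
        if d < best_d:
            best_xi = (1 - a) * grid_xi[n - 1] + a * grid_xi[n]  # parameter value
            best_d = d
    return best_xi, best_d


# Toy table (invented): N = 5 uniformly spaced parameter values, I = 2
grid_xi = np.linspace(0.0, 1.0, 5)
grid_gamma = np.stack([grid_xi, grid_xi ** 2], axis=1)

# An input lying 40% of the way from the entry at 0.5 to the entry at 0.75
gamma_in = 0.6 * grid_gamma[2] + 0.4 * grid_gamma[3]
xi_opt, dist = best_unvoicing_parameter(gamma_in, grid_xi, grid_gamma)
```

For this input the search returns a parameter of 0.6 with essentially zero distortion, since the input lies exactly on one segment of the table.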

C. Weighted Distortion Quantization

Commonly in speech coding, the magnitude is quantized using a
weighted distortion measure. In this case the quantized REW parameter
is then given by:

$$\hat{\xi} = \arg\min_{0 \le \xi \le 1} \left\{ \int_0^{\pi} \big| R(\omega) - R(\omega,\xi) \big|^2 W(\omega)\, d\omega \right\} \tag{21}$$

and the orthonormal function simplification, given in equation (13),
cannot be used. In this case, the weighted distortion between the input
and the parametric representation modeled spectra is equal to:

$$D_w\big(R, R(\xi)\big) = \int_0^{\pi} \big| R(\omega) - R(\omega,\xi) \big|^2 W(\omega)\, d\omega = \big(\gamma - \gamma(\xi)\big)^T\, \Psi\big(W(\omega)\big)\, \big(\gamma - \gamma(\xi)\big) \tag{22}$$

where $\Psi(W(\omega))$ is the weighted correlation matrix of the
orthonormal functions, whose elements are:

$$\Psi_{ij}\big(W(\omega)\big) = \int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega \tag{23}$$

$\gamma$ is the input coefficient vector, and $\gamma(\xi)$ is the
modeled parametric coefficient vector. In the VQ case, the quantized
parameter vector is given by:

$$\hat{\xi} = \arg\min_{\xi \in C_Q(\xi)} \left\{ \sum_{m=1}^{M} D_w\big(R_m, R(\xi_m)\big) \right\} \tag{24}$$



D. Weighted Distortion - Piecewise Linear Parametric Representation

Again, for practical considerations assume that the parametric
representation is piecewise linear, and may be represented by a set of N
spectra, $\{R(\omega,\xi_n)\}_{n=0}^{N-1}$. For the piecewise linear
representation, the interpolated quantized coefficient vector is:

$$\gamma(\hat{\xi}) = (1-\alpha)\,\gamma_{n-1} + \alpha\,\gamma_n; \quad \xi_{n-1} \le \hat{\xi} \le \xi_n; \quad \alpha = \frac{\hat{\xi} - \xi_{n-1}}{\xi_n - \xi_{n-1}} \tag{25}$$

In the case where parameter VQ is employed, the interpolation allows for
a substantial simplification of the search computations. In this case,
the distortion can be interpolated:

$$\begin{aligned}
D_w\big(R, R(\hat{\xi})\big) &= \big(\gamma - (1-\alpha)\gamma_{n-1} - \alpha\gamma_n\big)^T\, \Psi\big(W(\omega)\big)\, \big(\gamma - (1-\alpha)\gamma_{n-1} - \alpha\gamma_n\big) \\
&= \gamma^T \Psi \gamma + (1-\alpha)^2\, \gamma_{n-1}^T \Psi \gamma_{n-1} + \alpha^2\, \gamma_n^T \Psi \gamma_n \\
&\quad - 2(1-\alpha)\, \gamma^T \Psi \gamma_{n-1} - 2\alpha\, \gamma^T \Psi \gamma_n + 2\alpha(1-\alpha)\, \gamma_{n-1}^T \Psi \gamma_n
\end{aligned} \tag{26}$$

Note that no benefit is obtained here by using orthonormal functions;
therefore any function representation may be used. The above can be
easily generalized to the parameter VQ case. The optimal parameter that
minimizes the spectrally weighted distortion between two representation
vectors is given by:

$$\alpha_{opt} = \frac{(\gamma_n - \gamma_{n-1})^T\, \Psi\, (\gamma - \gamma_{n-1})}{(\gamma_n - \gamma_{n-1})^T\, \Psi\, (\gamma_n - \gamma_{n-1})} \tag{27}$$

and the respective optimal parameter value, which is a continuous
variable between zero and one, is given by equation (20). This result
allows a rapid search for the best unvoicing parameter value needed to
transform the coefficient vector to a scalar parameter, for encoding or
for VQ design. Alternatively, in order to eliminate using the matrix
$\Psi$, the scalar product may be redefined to incorporate the
time-varying spectral weighting. The respective orthonormal basis
functions then satisfy:

$$\int_0^{\pi} W(\omega)\, \psi_i(\omega)\, \psi_j(\omega)\, d\omega = \delta(i-j) \tag{28}$$

where $\delta(i-j)$ denotes the Kronecker delta. The respective
parameter vector is given by:

$$\gamma = \int_0^{\pi} W(\omega)\, R(\omega)\, \psi(\omega)\, d\omega \tag{29}$$

where $\psi(\omega) = [\psi_0(\omega), \psi_1(\omega), \ldots, \psi_{I-1}(\omega)]^T$
is an I-dimensional vector of time-varying orthonormal functions.
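A minimal numerical sketch of the weighted case follows: it builds the weighted correlation matrix of equation (23) by a rectangle-rule sum and evaluates the closed-form weighted interpolation factor of equation (27) on one segment. The basis (1 and ω, deliberately not orthonormal), the weighting function, and the coefficient vectors are invented for illustration, and the function names are hypothetical.

```python
import numpy as np


def weighted_correlation(basis, weight, d_omega):
    """Psi_ij = integral of W(w) psi_i(w) psi_j(w) dw (rectangle rule)."""
    return (basis * weight[:, None]).T @ basis * d_omega


def weighted_alpha_opt(gamma, g_prev, g_next, corr):
    """Minimizer of the weighted distortion along one segment."""
    step = g_next - g_prev
    return float(step @ corr @ (gamma - g_prev)) / float(step @ corr @ step)


def weighted_distortion(gamma, g_prev, g_next, corr, alpha):
    """Quadratic form of the interpolation error for a given alpha."""
    err = gamma - (1.0 - alpha) * g_prev - alpha * g_next
    return float(err @ corr @ err)


omega = np.linspace(0.0, np.pi, 200)
d_omega = omega[1] - omega[0]
basis = np.stack([np.ones_like(omega), omega], axis=1)  # 1 and w, not orthonormal
weight = 1.0 / (1.0 + omega)                            # made-up W(w)
corr = weighted_correlation(basis, weight, d_omega)

g_prev, g_next = np.array([0.2, 0.1]), np.array([0.6, 0.3])
gamma = np.array([0.5, 0.1])                            # input off the segment
alpha = weighted_alpha_opt(gamma, g_prev, g_next, corr)
d_min = weighted_distortion(gamma, g_prev, g_next, corr, alpha)
```

Perturbing `alpha` in either direction increases `weighted_distortion`, which is a quick way to confirm that the closed form is the minimizer of the quadratic in α.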

REW Parameter Analysis-By-Synthesis VQ

This section presents the AbS VQ paradigm for the REW parameter. First,
a system which quantizes the REW parameter by employing spectral-based
AbS is presented. Then simplified systems, which apply AbS to the REW
parameter, are presented.

A. REW Parameter Quantization by Magnitude AbS VQ

The novel Analysis-by-Synthesis (AbS) REW parameter VQ technique is
illustrated in FIGURE 3. An excitation vector $c_{ij}(m)$
($m = 1, \ldots, M$) is selected from the VQ codebook and is fed through
a synthesis filter to obtain a (synthesized, quantized) parameter vector
$\hat{\xi}(m)$, which is then mapped to a quantized representation
coefficient vector $\gamma(\hat{\xi}(m))$. This is compared with a
sequence of input representation coefficient vectors $\gamma(m)$, and
each error is spectrally weighted. Each spectrally weighted error is then
temporally weighted, and a distortion measure is obtained. A search
through all candidate excitation vectors determines an optimal choice.
The synthesis filter in FIGURE 3 can be viewed as a first order predictor
in a feedback loop. (While an auto-regressive synthesis filter is shown
here, in other arrangements a moving-average (MA) synthesis filter may be
used.) By allowing the value of the predictor parameter $P$ to change,
it becomes a "switched-predictor" scheme. Switched prediction is
introduced to allow for different levels of REW parameter correlation.

The scheme incorporates both spectral weighting and temporal
weighting. The spectral weighting is used for the distortion between each
pair of input and quantized spectra. In order to improve SEW/REW
mixing, particularly in mixed voiced and unvoiced speech segments, and
to increase speech crispness, especially for plosives and onsets, temporal
weighting is incorporated in the AbS REW VQ. The temporal weighting is
a monotonic function of the temporal gain. Two codebooks are used, and
each codebook has an associated predictor coefficient, $P_1$ and $P_2$.
The quantization target is an M-dimensional vector of REW spectra. Each
REW spectrum is represented by a vector of basis function coefficients
denoted by $\gamma(m)$. The search for the minimal weighted mean squared
error (WMSE) is performed over all the vectors, $c_{ij}(m)$, of the two
codebooks for $i = 1, 2$. The quantized REW function coefficient vector,
$\gamma(\hat{\xi}(m))$, is a function of the quantized parameter
$\hat{\xi}(m)$, which is obtained by passing the quantized vector,
$c_{ij}(m)$, through the synthesis filter. The weighted distortion
between each pair of input and quantized REW spectra is calculated. The
total distortion is a temporally weighted sum of the M spectrally
weighted distortions. Since the predictor coefficients are known, direct
VQ can be used to simplify the computations. For a piecewise linear
parametric REW representation, a substantial simplification of the search
computations may be obtained by interpolating the distortion between the
representation spectra set, as explained in sections 3.B. and 3.D.

A sequence of quantized parameters, such as $\hat{c}(k)$, is formed by
concatenating successive quantized vectors, such as
$\{\hat{c}_{ij}(m)\}_{m=1}^{M}$. The quantized parameter is computed
recursively by:

$$\hat{\xi}(k) = P(k)\,\hat{\xi}(k-1) + \hat{c}(k) \tag{30}$$

where $k$ is the time index of the coded waveform.
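The switched-predictor search around the recursion in equation (30) can be sketched as follows. To keep the sketch short, the distortion is measured directly on the parameter with per-waveform weights rather than on the spectra; that simplification, the function names, and all numeric values (predictor coefficients, codebooks, filter state) are assumptions for illustration only.

```python
import numpy as np


def synthesize(excitation, predictor, state):
    """First-order AR synthesis: xi_hat(k) = P * xi_hat(k-1) + c(k)."""
    out = np.empty(len(excitation))
    for k, c in enumerate(excitation):
        state = predictor * state + c
        out[k] = state
    return out


def abs_search(target, codebooks, predictors, state, weights):
    """Try every codevector of every codebook (each with its own predictor)
    and return ((i, j, xi_hat), distortion) for the best candidate."""
    best, best_d = (None, None, None), np.inf
    for i, (book, p) in enumerate(zip(codebooks, predictors)):
        for j, c in enumerate(book):
            xi_hat = synthesize(c, p, state)
            d = float(np.sum(weights * (target - xi_hat) ** 2))
            if d < best_d:
                best, best_d = (i, j, xi_hat), d
    return best, best_d


# Toy setup (invented values): two codebooks, M = 3 waveforms per vector
predictors = (0.9, 0.3)
codebooks = [np.array([[0.05, 0.05, 0.05], [0.0, -0.05, -0.05]]),
             np.array([[0.4, 0.4, 0.4], [0.1, 0.2, 0.3]])]
prev_state = 0.5                      # last quantized parameter of previous frame
weights = np.ones(3)

target = synthesize(codebooks[1][0], predictors[1], prev_state)
(i_best, j_best, xi_hat), d_best = abs_search(
    target, codebooks, predictors, prev_state, weights)
```

Because the toy target was generated from the first vector of the second codebook, the search recovers that pair of indices with zero distortion.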
B. Simplified REW Parameter AbS VQ 

The above scheme maps each quantized parameter to a coefficient
vector, which is used to compute the spectral distortion. This mapping
and the spectral distortion computation contribute to the complexity of
the scheme, and may be eliminated by using the simplified scheme
described below. For a high rate, and a smooth representation surface
$R(\omega,\xi)$, the total distortion is equal to the sum of the modeling
distortion and the quantization distortion:

$$\sum_{m=1}^{M} D_w\big(R(m), R(\hat{\xi}(m))\big) = \sum_{m=1}^{M} D_w\big(R(m), R(\xi(m))\big) + \sum_{m=1}^{M} D_w\big(R(\xi(m)), R(\hat{\xi}(m))\big) \tag{31}$$

The quantization distortion is related to the quantized parameter by:

$$\sum_{m=1}^{M} D_w\big(R(\xi(m)), R(\hat{\xi}(m))\big) = \sum_{m=1}^{M} \big(\gamma(\xi(m)) - \gamma(\hat{\xi}(m))\big)^T\, \Psi\big(W_m(\omega)\big)\, \big(\gamma(\xi(m)) - \gamma(\hat{\xi}(m))\big) \tag{32}$$

which, for the piecewise linear representation case, is equal to

$$\sum_{m=1}^{M} \frac{(\gamma_n - \gamma_{n-1})^T\, \Psi\big(W_m(\omega)\big)\, (\gamma_n - \gamma_{n-1})}{(\xi_n - \xi_{n-1})^2}\, \big(\xi(m) - \hat{\xi}(m)\big)^2 \tag{33}$$

which is linearly related to the REW parameter squared quantization
error, $(\xi(m) - \hat{\xi}(m))^2$, and, therefore, justifies direct VQ of
the REW parameter.



B.1. Simplified REW Parameter AbS VQ - Non Weighted Distortion

FIGURE 4 illustrates a simplified AbS VQ for the REW parametric
representation. The encoder maps the REW magnitude to an unvoicing
REW parameter, and then quantizes the parameter by AbS VQ. Initially,
the magnitudes of the M REWs in the frame are mapped to coefficient
vectors, $\{\gamma(m)\}_{m=1}^{M}$. Then, for each coefficient vector, a
search is performed to find the optimal representation parameter,
$\xi(\gamma)$, using equation (20), to form an M-dimensional parameter
vector for the current frame, $\{\xi(\gamma(m))\}_{m=1}^{M}$. Finally,
the parameter vector is encoded by AbS VQ. The decoded spectra,
$\{R(\omega,\hat{\xi}(m))\}_{m=1}^{M}$, are obtained from the quantized
parameter vector, $\{\hat{\xi}(m)\}_{m=1}^{M}$, using equation (15). This
scheme allows for higher temporal, as well as spectral, REW resolution
compared to the common method described in W. B. Kleijn, et al., IEEE
ICASSP'95, pp. 508-511 (1995), since no downsampling is performed, and
the continuous parameter is vector quantized in AbS.

B.2. Simplified REW Parameter AbS VQ - Weighted Distortion

The simplified quantization scheme is improved to incorporate
spectral and temporal weightings, as illustrated in Figure 5. The REW
coefficient vector is first mapped to the REW parameter by minimizing a
distortion, which is weighted by the coefficient spectral weighting
matrix $\Psi$, as described in section 3.D. Then, the resulting REW
parameter is used to compute a weighting, $W_s(\xi(m))$, which we choose
to be the spectral sensitivity to the REW parameter squared quantization
error, $(\xi(m) - \hat{\xi}(m))^2$, given by:

$$W_s\big(\xi(m)\big) = \left. \frac{\partial D_w\big(R(\xi(m)), R(\hat{\xi}(m))\big)}{\partial \big(\xi(m) - \hat{\xi}(m)\big)^2} \right|_{\hat{\xi}(m) = \xi(m)} \tag{34}$$

For the piecewise linear representation case, using equation (33), the
following equation is obtained:

$$W_s\big(\xi(m)\big) = \frac{(\gamma_n - \gamma_{n-1})^T\, \Psi\big(W_m(\omega)\big)\, (\gamma_n - \gamma_{n-1})}{(\xi_n - \xi_{n-1})^2} \tag{35}$$

The above derivative can be easily computed off line. Additionally, a
temporal weighting, in the form of a monotonic function of the gain,
denoted by $W_t(g(m))$, is used to give relatively large weight to
waveforms with larger gain values. The AbS REW parameter quantization is
computed by minimizing the combined spectrally and temporally weighted
distortion:

$$D_w\big(\{\xi(m)\}_{m=1}^{M}, \{\hat{\xi}(m)\}_{m=1}^{M}\big) = \sum_{m=1}^{M} W_t\big(g(m)\big)\, W_s\big(\xi(m)\big)\, \big(\xi(m) - \hat{\xi}(m)\big)^2 \tag{36}$$

The weighted distortion scheme improves the reconstructed speech
quality, most notably in mixed voiced and unvoiced speech segments. This
may be explained by an improvement in REW/SEW mixing.
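The combined distortion of equation (36) is a weighted sum of squared parameter errors, which the following sketch evaluates for two candidate quantizations. The choice of the square root as the monotonic temporal weighting function, the unit spectral sensitivities, and the parameter and gain values are assumptions made for illustration; the text only requires the temporal weighting to be a monotonic function of the gain.

```python
import numpy as np


def weighted_parameter_distortion(xi, xi_hat, gains, w_spectral, w_temporal=np.sqrt):
    """Sum over waveforms of W_t(g(m)) * W_s(xi(m)) * (xi(m) - xi_hat(m))^2.

    w_temporal defaults to sqrt, an assumed monotonic function of the gain.
    """
    xi = np.asarray(xi, dtype=float)
    xi_hat = np.asarray(xi_hat, dtype=float)
    w_t = w_temporal(np.asarray(gains, dtype=float))
    w_s = np.asarray(w_spectral, dtype=float)
    return float(np.sum(w_t * w_s * (xi - xi_hat) ** 2))


xi = [0.2, 0.8]            # unquantized parameters for M = 2 waveforms (toy values)
gains = [1.0, 4.0]         # the second waveform has the larger gain
w_s = [1.0, 1.0]           # spectral sensitivities, taken as given here

d_low_gain_error = weighted_parameter_distortion(xi, [0.3, 0.8], gains, w_s)
d_high_gain_error = weighted_parameter_distortion(xi, [0.2, 0.7], gains, w_s)
```

The same parameter error costs twice as much when it falls on the higher-gain waveform, which is the behavior the temporal weighting is meant to produce.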



Dual Predictive AbS SEW Quantization

Figure 6 illustrates a Dual Predictive SEW AbS VQ scheme which
uses two observables, (a) the quantized REW, and (b) the past quantized
SEW, to jointly predict the current SEW. Although we refer to the operator
on each observable as a "predictor", in fact both are components of a
single optimized estimator. The SEW and the REW are complex random
vectors, and their sum is a residual vector having elements whose
magnitudes have a mean value of unity. In low bit-rate WI coding, the
relation between the SEW and the REW magnitudes was approximated by
computing the magnitude of one as the unity complement of the other.
Suppose $|\hat{r}_M|$ denotes the spectral magnitude vector of the last
quantized REW in the current frame. An "implied" SEW vector is
calculated by:

$$|\hat{s}_{M,implied}| = 1 - |\hat{r}_M| \tag{37}$$

and from which the mean vector is removed. Vectors whose means are
removed are denoted with an apostrophe. Then, a (mean-removed)
estimated "implied" SEW magnitude vector, $|\tilde{s}'_{M,implied}|$, is
computed using a diagonal estimation matrix $P_{REW}$:

$$|\tilde{s}'_{M,implied}| = P_{REW}\, |\hat{s}'_{M,implied}| \tag{38}$$

Additionally, a "self-predicted" SEW vector is computed by multiplying the
delayed quantized SEW vector, $|\hat{s}'_0|$, by a diagonal prediction
matrix $P_{SEW}$. The predicted (mean-removed) SEW vector,
$|\tilde{s}'_M|$, is given by:

$$|\tilde{s}'_M| = P_{REW}\, |\hat{s}'_{M,implied}| + P_{SEW}\, |\hat{s}'_0| \tag{39}$$



The quantized vector, $\hat{c}_M$, is determined by an AbS search
according to:

$$\hat{c}_M = \arg\min_{c_i} \Big\{ \big(|s'_M| - |\tilde{s}'_M| - c_i\big)^T\, W_M\, \big(|s'_M| - |\tilde{s}'_M| - c_i\big) \Big\} \tag{40}$$

where $W_M$ is the diagonal spectral weighting matrix; see O. Gottesman,
(1999), IEEE ICASSP'99, vol. 1:269-272; O. Gottesman and A. Gersho,
(1999), IEEE Speech Coding Workshop, pp. 90-92, Finland; O. Gottesman
and A. Gersho, (1999), EUROSPEECH'99, pp. 1443-1446, Hungary. The
(mean-removed) quantized SEW magnitude, $|\hat{s}'_M|$, is the sum of the
predicted SEW vector, $|\tilde{s}'_M|$, and the codevector $\hat{c}_M$:

$$|\hat{s}'_M| = |\tilde{s}'_M| + \hat{c}_M \tag{41}$$

In order to exploit the information about the pitch and voicing level,
the possible pitch range was partitioned into six subintervals, and the
REW parameter range into three. Also, eighteen codebooks were generated,
one for each pair of pitch range and unvoicing range. Each codebook has
two associated mean vectors and two diagonal prediction matrices. To
improve the coder robustness and the synthesis smoothness, the cluster
used for the training of each codebook overlaps with those of the
codebooks for neighboring ranges. Since each quantized target vector
may have a different value of the removed mean, the quantized mean is
added temporarily to the filter memory after the state update, and the next
quantized vector's mean is subtracted from it before filtering is performed.
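The dual-predictive quantization step can be sketched for a single frame as below. This is a sketch under stated assumptions, not the patented coder: the diagonal matrices are stored as vectors, the two mean vectors per codebook are taken to be one for the SEW and one for the implied SEW, the search minimizes a diagonally weighted squared error between the prediction residual and each codevector, and every numeric value is invented.

```python
import numpy as np


def dual_predictive_quantize(sew_mag, rew_hat_mag, prev_sew_hat_mr, p_rew, p_sew,
                             mean_sew, mean_implied, codebook, weights):
    """One frame of dual-predictive SEW magnitude quantization (sketch)."""
    implied = 1.0 - rew_hat_mag                    # unity complement of the REW
    implied_mr = implied - mean_implied            # mean-removed implied SEW
    predicted = p_rew * implied_mr + p_sew * prev_sew_hat_mr  # two "predictors"
    target = (sew_mag - mean_sew) - predicted      # residual to be quantized
    errs = np.sum(weights * (target - codebook) ** 2, axis=1)  # weighted search
    j = int(np.argmin(errs))
    sew_hat_mr = predicted + codebook[j]           # mean-removed quantized SEW
    return j, sew_hat_mr


# Toy frame with three harmonics (all values invented)
rew_hat_mag = np.array([0.3, 0.5, 0.7])
prev_sew_hat_mr = np.array([0.1, 0.0, -0.1])
p_rew = np.array([0.5, 0.5, 0.5])
p_sew = np.array([0.8, 0.8, 0.8])
mean_sew = np.array([0.6, 0.5, 0.4])
mean_implied = np.array([0.5, 0.5, 0.5])
codebook = np.array([[0.0, 0.0, 0.0],
                     [0.05, 0.05, 0.05],
                     [-0.02, 0.01, 0.03]])
weights = np.ones(3)
sew_mag = np.array([0.76, 0.51, 0.25])

j_best, sew_hat_mr = dual_predictive_quantize(
    sew_mag, rew_hat_mag, prev_sew_hat_mr, p_rew, p_sew,
    mean_sew, mean_implied, codebook, weights)
sew_hat = sew_hat_mr + mean_sew                    # add the mean back
```

In this toy frame the prediction residual coincides with the third codevector, so the decoded magnitude matches the input; in the scheme described above the codebook, means, and prediction matrices would additionally be selected by pitch range and REW parameter range.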



The output weighted SNR, and the mean-removed weighted SNR,
of the scheme are illustrated in Figure 7. Evidently, a very high SNR is
achieved with a relatively small number of bits. The weighted SNR of
each codebook, for the 9-bit case, is illustrated in Figure 8. The
differences in SNR between the three REW parameter ranges are dominated
by the different means. The respective mean-removed weighted SNR of
each codebook is illustrated in FIGURE 9. Within each voicing range the
differences in SNR between the pitch ranges are mainly due to the
number of bits per vector sample, which decreases as the number of
harmonics increases, and to the prediction gain.

Examples of the two predictors for three REW parameter ranges
are illustrated in Figure 10. For voiced segments the SEW predictor is
dominant, whereas the REW predictor is less important since its input
variations in this range are very small. As the voicing decreases, the SEW
predictor decreases, and the REW predictor becomes more dominant at
the lower part of the spectrum. Both predictors decrease as the voicing
decreases from the intermediate range to the unvoiced range.

Bit Allocation 

The bit allocation for the 2.8 kbps EWI coder is given in Table 1.
The frame length is 20 ms, and ten waveforms are extracted per frame.
The line spectral frequencies (LSFs) are coded using predictive MSVQ
having two stages of 10 bits each, a 2-bit increase compared to the past
version of our coder; see O. Gottesman and A. Gersho, (1999), IEEE
Speech Coding Workshop, pp. 90-92, Finland; O. Gottesman and A.
Gersho, (1999), EUROSPEECH'99, pp. 1443-1446, Hungary. The
10-dimensional log-gain vector is quantized using 9-bit AbS VQ. The pitch
is coded twice per frame. A fixed SEW phase was trained for each one of
the eighteen pitch-voicing ranges; see O. Gottesman, (1999), IEEE
ICASSP'99, vol. 1:269-272.



Parameter         Bits / Frame    Bits / second
LPC               20              1000
Pitch             2 x 6 = 12      600
Gain              9               450
SEW magnitude     8               400
REW magnitude     7               350
Total             56              2800

Table 1
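The totals in Table 1 follow directly from the 20 ms frame length, as this short check illustrates. The dictionary keys simply mirror the table rows; nothing here is part of the coder itself.

```python
FRAME_MS = 20  # frame length stated in the text

bits_per_frame = {
    "LPC": 2 * 10,          # two MSVQ stages of 10 bits each
    "Pitch": 2 * 6,         # pitch coded twice per frame, 6 bits each
    "Gain": 9,
    "SEW magnitude": 8,
    "REW magnitude": 7,
}

frames_per_second = 1000 // FRAME_MS
bits_per_second = {name: bits * frames_per_second
                   for name, bits in bits_per_frame.items()}

total_bits_per_frame = sum(bits_per_frame.values())
total_bits_per_second = sum(bits_per_second.values())
print(total_bits_per_frame, total_bits_per_second)
```

At 50 frames per second the 56 bits per frame add up to 2800 bps, of which the REW magnitude takes 350 bps.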

Subjective Results

A subjective A/B test was conducted to compare the 2.8 kbps EWI
coder of this invention to G.723.1. The test data included 24 modified
intermediate reference system (M-IRS) filtered speech sentences, 12 of
which are from female speakers and 12 from male speakers; see ITU-T,
(1996), "Recommendation P.830, Subjective Performance Assessment of
Telephone Band and Wideband Digital Codecs", Annex D, ITU, Geneva.
Twelve listeners participated in the test. The test results, listed in
Table 2 and Table 3, indicate that the subjective quality of the 2.8 kbps
EWI exceeds that of G.723.1 at 5.3 kbps, and it is slightly better than
that of G.723.1 at 6.3 kbps. The EWI preference is higher for male than
for female speakers.

Test      2.8 kbps WI    5.3 kbps G.723.1    No Preference
Female    40.28%         33.33%              26.39%
Male      48.61%         24.31%              27.08%
Total     44.44%         28.82%              26.74%

Table 2

Table 2 shows the results of the subjective A/B test comparing the
2.8 kbps EWI coder to 5.3 kbps G.723.1. With 95% certainty the result
lies within +/-5.53%.

Test      2.8 kbps WI    6.3 kbps G.723.1    No Preference
Female    38.19%         36.81%              25.00%
Male      43.06%         31.94%              25.00%
Total     40.63%         34.38%              25.00%

Table 3

Table 3 shows the results of the subjective A/B test comparing the
2.8 kbps EWI coder to 6.3 kbps G.723.1. With 95% certainty the result
lies within +/-5.59%.

It should, of course, be noted that while the present invention has been
described in terms of an illustrative embodiment, other arrangements will
be apparent to those of ordinary skill in the art. For example:

1. While the disclosed embodiment in FIGURE 3 describes an
auto-regressive (AR) synthesis filter, in other arrangements a
moving-average (MA) filter may be used.

2. While the disclosed embodiment relates to waveform interpolative
speech coding, in other arrangements it may be used in other coding
schemes.

3. While in the disclosed embodiment temporal weighting and/or
spectral weighting are described, they are optional, and in other
arrangements either or both of them may not be used.

4. While in the disclosed embodiment switched prediction having two
predictors is described, in other arrangements no switch, or more than
two predictor choices, may be used.

5. While in the disclosed embodiment illustrated in Figure 6 mean
vectors are subtracted from the vector, this may be viewed as optional,
and in other arrangements any or all of such mean vectors may not be
used.

6. While in the disclosed embodiment the pitch range and/or the
voicing parameter values were partitioned into subranges, and codebooks
were used for each subrange, this may be viewed as optional, and in other
arrangements any or all of such subranges may not be used, or another
number or type of subranges may be used.

7. While in the disclosed embodiment the prediction matrices are
diagonal, in other arrangements non-diagonal prediction matrices may be
used.